.

Chatting with our data

** Lets meet a very nice guy: dplyr **

  1. We are going to need take some variable

  2. and make a new one from it

  3. Perhaps, there are going to be sometimes when we would like to know the possible cases that would come from the combination of two categorical variables and calculate the frequency of each scenario (distribution) and evenresume the information than’d come from it

  4. Another times, we’d just need to order our dataset from some specific variable or variables in order to watch the biggest (or the smallest) ones at above (or at the bottom)

  5. And finally what we’d want to tell our data is that we only want to see those cases (rows) than would fulfill some condition

So, translating …

Quoting from - dplyr’s github:

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  1. select() picks variables based on their names.

  2. mutate() adds new variables that are functions of existing variables.

  3. summarise() reduces multiple values down to a single summary.

  4. arrange() changes the ordering of the rows.

  5. filter() picks cases based on their values.

#source("script_mns.R")
# library("nycflights13")
# saveRDS(flights,"data/flights.rds")
# saveRDS(airlines,"data/airlines.rds")
# library(tidyverse)
# mpg
flights<-readRDS("data/flights.rds")
# airlines<-readRDS("data/airlines.rds")
flights

select

library(dplyr)
# Drop unimportant variables so it's easier to understand the join results.
flights %>% select(year:day, hour, origin, dest, tailnum, carrier)

mutate

flights %>% mutate(I_tarde=if_else(arr_delay>0,1,0))
#que lindo detalle
flights%>%
  select(sched_arr_time,arr_time,arr_delay)%>%
  mutate(I_tarde=if_else(arr_delay>0,1,0))

arrange

flights%>%arrange(year)
flights%>%
  arrange(year,month,day)
flights%>%
  select(year,month,sched_dep_time,sched_arr_time,arr_time,arr_delay)%>%
  mutate(I_tarde=if_else(arr_delay>0,1,0))%>%
  arrange(desc(month),sched_dep_time,desc(I_tarde))

filter

flights%>%
  filter(sched_dep_time<1200)
flights%>%
  select(year,month,sched_dep_time,sched_arr_time,arr_time,arr_delay)%>%
  mutate(I_tarde=if_else(arr_delay>0,1,0))%>%
  filter(sched_dep_time>=0001 & sched_dep_time<600 & I_tarde==1) 
flights2<-flights%>%
  mutate(I_tarde=if_else(arr_delay>0,1,0))
# is.na(flights2$I_tarde)
sum(is.na(flights2$I_tarde)); print(paste0("The 'NA's represent the: ", round(sum(is.na(flights2$I_tarde))/nrow(flights2),4)*100,"% of the Universe"))
[1] 9430
[1] "The 'NA's represent the: 2.8% of the Universe"
flights2%>%
  filter(is.na(I_tarde))

Option 1: Take them out of here

# flights2[!is.na(flights2$I_tarde),]
flights2_opt1<-flights2%>%
  filter(!is.na(I_tarde))
flights2_opt1

Option 2: Special Value

flights2_opt2<-flights2%>%
  mutate(I_tarde=if_else(is.na(I_tarde),-9999,I_tarde))
flights2_opt2; sum(is.na(flights2_opt2$I_tarde)); print(paste0("The 'NA's represent the: ", round(sum(is.na(flights2_opt2$I_tarde))/nrow(flights2_opt2),4)*100,"% of the Universe")); sum(flights2_opt2$I_tarde==-9999); print(paste0("The Special Values represent the: ", round(sum(flights2_opt2$I_tarde==-9999)/nrow(flights2_opt2),4)*100,"% of the Universe"))
[1] 0
[1] "The 'NA's represent the: 0% of the Universe"
[1] 9430
[1] "The Special Values represent the: 2.8% of the Universe"

Option N & beyond: More sophisticated stuff

print(important_mens)
[1] " We won't see any of these :D   "

Grouped operations

group_by

  • Counting number of flights per day

Two-table

Let’s go for some joins

airlines<-readRDS("data/airlines.rds")
airlines
  • Getting the Airlines’ names

Distribution of flights by Airline for Profe Lalo’s homework

References

Cheatsheet

LS0tCnRpdGxlOiAiIgpvdXRwdXQ6CiAgaHRtbF9ub3RlYm9vazoKICAgIGRmX3ByaW50OiBwYWdlZAogIGh0bWxfZG9jdW1lbnQ6CiAgICBkZl9wcmludDogcGFnZWQKLS0tCgojIDxhPjxpbWcgc3JjPSd3d3cvYmJ2YV91YWR5MDIucG5nJyBhbGlnbj0icmlnaHQiIGhlaWdodD0iMTM5IiAvPjwvYT4KICAgLiA8YnIvPgoKI0NoYXR0aW5nIHdpdGggb3VyIGRhdGEKCioqIExldHMgbWVldCBhIHZlcnkgbmljZSBndXk6ICpgZHBseXJgKiAqKgoKLSBNYXliZSwgdGhlIG1vc3Qgb2Ygb3VyIHRpbWUsIHdpbGwgYmUgc3BlbnQgb24gYHRhbGtpbmcgdG8gb3VyIGRhdGFgCgotIE9uZSBvZiB0aGUgbW9zdCBpbXBvcnRhbnQgdHlwZSBvZiB3b3JkcyBvbiBvdXIgbGFuZ3VhZ2VzIGFyZSB0aGUgYHZlcmJzYAoKLSBUaGlzIG1ha2Ugc2Vuc2UgYmVjYXVzZSB0aGV5IGBleGVjdXRlIGFjdGlvbnNgLCBhbmQgdGhhdCBpcyB3aGF0IHdlJ2xsIHdhbnQKCjEuIFdlIGFyZSBnb2luZyB0byBuZWVkIGB0YWtlYCBzb21lICp2YXJpYWJsZSogCgoyLiBhbmQgYG1ha2VgIGEgbmV3IG9uZSBmcm9tIGl0CgozLiBQZXJoYXBzLCB0aGVyZSBhcmUgZ29pbmcgdG8gYmUgc29tZXRpbWVzIHdoZW4gd2Ugd291bGQgbGlrZSB0byBrbm93IHRoZSBwb3NzaWJsZSBjYXNlcyB0aGF0IHdvdWxkIGNvbWUgZnJvbSB0aGUgY29tYmluYXRpb24gb2YgdHdvIGNhdGVnb3JpY2FsIHZhcmlhYmxlcyBhbmQgY2FsY3VsYXRlIHRoZSBmcmVxdWVuY3kgb2YgZWFjaCBzY2VuYXJpbyAoYGRpc3RyaWJ1dGlvbmApIGFuZCBldmVuYCByZXN1bWVgIHRoZSBpbmZvcm1hdGlvbiB0aGFuJ2QgY29tZSBmcm9tIGl0Cgo0LiAgQW5vdGhlciB0aW1lcywgd2UnZCBqdXN0IG5lZWQgdG8gYG9yZGVyYCBvdXIgZGF0YXNldCBmcm9tIHNvbWUgc3BlY2lmaWMgdmFyaWFibGUgb3IgdmFyaWFibGVzIGluIG9yZGVyIHRvIHdhdGNoIHRoZSBgYmlnZ2VzdGAgKG9yIHRoZSBgc21hbGxlc3RgKSBvbmVzIGF0IGFib3ZlIChvciBhdCB0aGUgYm90dG9tKQoKNS4gQW5kIGZpbmFsbHkgd2hhdCB3ZSdkIHdhbnQgdG8gKnRlbGwqIG91ciBkYXRhIGlzIHRoYXQgd2Ugb25seSB3YW50IHRvIHNlZSB0aG9zZSBjYXNlcyAoKnJvd3MqKSB0aGFuIHdvdWxkIGZ1bGZpbGwgc29tZSBgY29uZGl0aW9uYAoKCiMjIFNvLCB0cmFuc2xhdGluZyAuLi4KCioqUXVvdGluZyBmcm9tIC0gWypkcGx5cidzIGdpdGh1YipdKGh0dHBzOi8vZ2l0aHViLmNvbS90aWR5dmVyc2UvZHBseXIvYmxvYi9tYXN0ZXIvUkVBRE1FLm1kKToqKgoKZHBseXIgaXMgYSBncmFtbWFyIG9mIGRhdGEgbWFuaXB1bGF0aW9uLCBwcm92aWRpbmcgYSBjb25zaXN0ZW50IHNldCBvZgp2ZXJicyB0aGF0IGhlbHAgeW91IHNvbHZlIHRoZSBtb3N0IGNvbW1vbiBkYXRhIG1hbmlwdWxhdGlvbiBjaGFsbGVuZ2VzOgoKICAKMS4gYHNlbGVjdCgpYCBwaWNrcyB2YXJpYWJsZXMgYmFzZWQgb24gdGhlaXIgbmFtZXMuCgoyLiBgbXV0YXRlKClgIGFkZHMgbmV3IHZhcmlhYmxlcyB0aGF0IGFyZSBmdW5jdGlvbnMgb2YgZXhpc3RpbmcKdmFyaWFibGVzLgoKMy4gYHN1bW1hcmlzZSgpYCByZWR1Y2VzIG11bHRpcGxlIHZhbHVlcyBkb3duIHRvIGEgc2luZ2xlIHN1bW1hcnkuCgo0LiBgYXJyYW5nZSgpYCBjaGFuZ2VzIHRoZSBvcmRlcmluZyBvZiB0aGUgcm93cy4KCjUuIGBmaWx0ZXIoKWAgcGlja3MgY2FzZXMgYmFzZWQgb24gdGhlaXIgdmFsdWVzLgoKCgoKCgpgYGB7cn0KI3NvdXJjZSgic2NyaXB0X21ucy5SIikKIyBsaWJyYXJ5KCJueWNmbGlnaHRzMTMiKQojIHNhdmVSRFMoZmxpZ2h0cywiZGF0YS9mbGlnaHRzLnJkcyIpCiMgc2F2ZVJEUyhhaXJsaW5lcywiZGF0YS9haXJsaW5lcy5yZHMiKQojIGxpYnJhcnkodGlkeXZlcnNlKQojIG1wZwoKYGBgCgpgYGB7cn0KZmxpZ2h0czwtcmVhZFJEUygiZGF0YS9mbGlnaHRzLnJkcyIpCmZsaWdodHMKYGBgCgojIyMgKipgc2VsZWN0YCoqCmBgYHtyfQpsaWJyYXJ5KGRwbHlyKQojIERyb3AgdW5pbXBvcnRhbnQgdmFyaWFibGVzIHNvIGl0J3MgZWFzaWVyIHRvIHVuZGVyc3RhbmQgdGhlIGpvaW4gcmVzdWx0cy4KZmxpZ2h0cyAlPiUgc2VsZWN0KHllYXI6ZGF5LCBob3VyLCBvcmlnaW4sIGRlc3QsIHRhaWxudW0sIGNhcnJpZXIpCmBgYAoKCiMjIyAqKmBtdXRhdGVgKioKYGBge3J9CmZsaWdodHMgJT4lIG11dGF0ZShJX3RhcmRlPWlmX2Vsc2UoYXJyX2RlbGF5PjAsMSwwKSkKYGBgCgpgYGB7cn0KI3F1ZSBsaW5kbyBkZXRhbGxlCmZsaWdodHMlPiUKICBzZWxlY3Qoc2NoZWRfYXJyX3RpbWUsYXJyX3RpbWUsYXJyX2RlbGF5KSU+JQogIG11dGF0ZShJX3RhcmRlPWlmX2Vsc2UoYXJyX2RlbGF5PjAsMSwwKSkKYGBgCgojIyMgKipgYXJyYW5nZWAqKgpgYGB7cn0KZmxpZ2h0cyU+JWFycmFuZ2UoeWVhcikKYGBgCmBgYHtyfQpmbGlnaHRzJT4lCiAgYXJyYW5nZSh5ZWFyLG1vbnRoLGRheSkKYGBgCmBgYHtyfQpmbGlnaHRzJT4lCiAgc2VsZWN0KHllYXIsbW9udGgsc2NoZWRfZGVwX3RpbWUsc2NoZWRfYXJyX3RpbWUsYXJyX3RpbWUsYXJyX2RlbGF5KSU+JQogIG11dGF0ZShJX3RhcmRlPWlmX2Vsc2UoYXJyX2RlbGF5PjAsMSwwKSklPiUKICBhcnJhbmdlKGRlc2MobW9udGgpLHNjaGVkX2RlcF90aW1lLGRlc2MoSV90YXJkZSkpCmBgYAoKIyMjICoqYGZpbHRlcmAqKgpgYGB7cn0KZmxpZ2h0cyU+JQogIGZpbHRlcihzY2hlZF9kZXBfdGltZTwxMjAwKQpgYGAKCmBgYHtyfQpmbGlnaHRzJT4lCiAgc2VsZWN0KHllYXIsbW9udGgsc2NoZWRfZGVwX3RpbWUsc2NoZWRfYXJyX3RpbWUsYXJyX3RpbWUsYXJyX2RlbGF5KSU+JQogIG11dGF0ZShJX3RhcmRlPWlmX2Vsc2UoYXJyX2RlbGF5PjAsMSwwKSklPiUKICBmaWx0ZXIoc2NoZWRfZGVwX3RpbWU+PTAwMDEgJiBzY2hlZF9kZXBfdGltZTw2MDAgJiBJX3RhcmRlPT0xKSAKYGBgCgpgYGB7cn0KZmxpZ2h0czI8LWZsaWdodHMlPiUKICBtdXRhdGUoSV90YXJkZT1pZl9lbHNlKGFycl9kZWxheT4wLDEsMCkpCgojIGlzLm5hKGZsaWdodHMyJElfdGFyZGUpCgoKc3VtKGlzLm5hKGZsaWdodHMyJElfdGFyZGUpKTsgcHJpbnQocGFzdGUwKCJUaGUgJ05BJ3MgcmVwcmVzZW50IHRoZTogIiwgcm91bmQoc3VtKGlzLm5hKGZsaWdodHMyJElfdGFyZGUpKS9ucm93KGZsaWdodHMyKSw0KSoxMDAsIiUgb2YgdGhlIFVuaXZlcnNlIikpCmBgYAoKYGBge3J9CmZsaWdodHMyJT4lCiAgZmlsdGVyKGlzLm5hKElfdGFyZGUpKQpgYGAKCipPcHRpb24gMTogVGFrZSB0aGVtIG91dCBvZiBoZXJlKgpgYGB7cn0KIyBmbGlnaHRzMlshaXMubmEoZmxpZ2h0czIkSV90YXJkZSksXQoKZmxpZ2h0czJfb3B0MTwtZmxpZ2h0czIlPiUKICBmaWx0ZXIoIWlzLm5hKElfdGFyZGUpKQoKZmxpZ2h0czJfb3B0MQpgYGAKCipPcHRpb24gMjogU3BlY2lhbCBWYWx1ZSoKYGBge3J9CmZsaWdodHMyX29wdDI8LWZsaWdodHMyJT4lCiAgbXV0YXRlKElfdGFyZGU9aWZfZWxzZShpcy5uYShJX3RhcmRlKSwtOTk5OSxJX3RhcmRlKSkKCmZsaWdodHMyX29wdDI7IHN1bShpcy5uYShmbGlnaHRzMl9vcHQyJElfdGFyZGUpKTsgcHJpbnQocGFzdGUwKCJUaGUgJ05BJ3MgcmVwcmVzZW50IHRoZTogIiwgcm91bmQoc3VtKGlzLm5hKGZsaWdodHMyX29wdDIkSV90YXJkZSkpL25yb3coZmxpZ2h0czJfb3B0MiksNCkqMTAwLCIlIG9mIHRoZSBVbml2ZXJzZSIpKTsgc3VtKGZsaWdodHMyX29wdDIkSV90YXJkZT09LTk5OTkpOyBwcmludChwYXN0ZTAoIlRoZSBTcGVjaWFsIFZhbHVlcyByZXByZXNlbnQgdGhlOiAiLCByb3VuZChzdW0oZmxpZ2h0czJfb3B0MiRJX3RhcmRlPT0tOTk5OSkvbnJvdyhmbGlnaHRzMl9vcHQyKSw0KSoxMDAsIiUgb2YgdGhlIFVuaXZlcnNlIikpCmBgYAoKKk9wdGlvbiBOICYgYmV5b25kOiBNb3JlIHNvcGhpc3RpY2F0ZWQgc3R1ZmYqCmBgYHtyfQojIHByaW50KGltcG9ydGFudF9tZW5zKQpgYGAKCiMjIEdyb3VwZWQgb3BlcmF0aW9ucwoKIyMjICoqYGdyb3VwX2J5YCoqCgotIENvdW50aW5nIG51bWJlciBvZiBmbGlnaHRzIHBlciBkYXkKCmBgYHtyfQpmbGlnaHRzMl9vcHQyJT4lCiAgZ3JvdXBfYnkoeWVhcixtb250aCxkYXkpJT4lCiAgbXV0YXRlKG5fZmxpZ2h0cz1uKCkpCmBgYAoKCmBgYHtyfQpmbGlnaHRzMl9vcHQyJT4lCiAgZ3JvdXBfYnkoeWVhcixtb250aCxkYXkpJT4lCiAgbXV0YXRlKG5fZmxpZ2h0cz1uKCkjLAogICAgICAgICAjaWRyb3c9cm93X251bWJlcigpCiAgICAgICAgICklPiUKICBhcnJhbmdlKHllYXIsbW9udGgsZGF5KSU+JQogIHNlbGVjdCh5ZWFyLG1vbnRoLGRheSxuX2ZsaWdodHMpJT4lCiAgZmlsdGVyKHJvd19udW1iZXIoKT09MSkKICAKYGBgCmBgYHtyfQpmbGlnaHRzMl9vcHQyJT4lCiAgZ3JvdXBfYnkoeWVhcixtb250aCxkYXkpJT4lCiAgbXV0YXRlKG5fZmxpZ2h0cz1uKCkjLAogICAgICAgICAjaWRyb3c9cm93X251bWJlcigpCiAgICAgICAgICklPiUKICBhcnJhbmdlKHllYXIsbW9udGgsZGF5KSU+JQogIAogICMgc2VsZWN0KHllYXIsbW9udGgsZGF5LG5fZmxpZ2h0cyklPiUKICBmaWx0ZXIocm93X251bWJlcigpPT0xKSU+JQogIHVuZ3JvdXAoKSU+JQogIHNlbGVjdChhcnJfZGVsYXksbl9mbGlnaHRzKQogIApgYGAKCgpgYGB7cn0KZmxpZ2h0czJfb3B0MiU+JQogIGdyb3VwX2J5KHllYXIsbW9udGgsZGF5KSU+JQogIHN1bW1hcmlzZShuX2ZsaWdodHM9bigpKQogIApgYGAKCgpgYGB7cn0KYWlybGluZXM8LXJlYWRSRFMoImRhdGEvYWlybGluZXMucmRzIikKYWlybGluZXMKYGBgCgojIyBUd28tdGFibGUKW0xldCdzIGdvIGZvciBzb21lIGpvaW5zXShodHRwczovL2RwbHlyLnRpZHl2ZXJzZS5vcmcvYXJ0aWNsZXMvdHdvLXRhYmxlLmh0bWwpCgpgYGB7cn0KYWlybGluZXM8LXJlYWRSRFMoImRhdGEvYWlybGluZXMucmRzIikKYWlybGluZXMKYGBgCgotIEdldHRpbmcgdGhlIEFpcmxpbmVzJyBuYW1lcwpgYGB7cn0KZmxpZ2h0c193X25hbWVzPC1mbGlnaHRzMl9vcHQyJT4lCiAgbGVmdF9qb2luKGFpcmxpbmVzLGJ5ID0gYygiY2FycmllciIgPSAiY2FycmllciIpKQoKZmxpZ2h0c193X25hbWVzCmBgYAoKIyMgRGlzdHJpYnV0aW9uIG9mIGZsaWdodHMgYnkgQWlybGluZSBmb3IgUHJvZmUgTGFsbydzIGhvbWV3b3JrCgpgYGB7cn0KZmxpZ2h0c193X25hbWVzJT4lCiAgZ3JvdXBfYnkobmFtZSklPiUKICBzdW1tYXJpc2UobnVtX3Z1ZWxvcz1uKCksCiAgICAgICAgICAgIHBvcmNfdnVlbG9zPXJvdW5kKG51bV92dWVsb3MvbnJvdyguKSw0KQogICAgICAgICAgICApJT4lCiAgYXJyYW5nZShkZXNjKHBvcmNfdnVlbG9zKSkKYGBgCgoKCgojUmVmZXJlbmNlcwoKLSBbR28gdG8gaGltXShodHRwczovL2dpdGh1Yi5jb20vdGlkeXZlcnNlL2RwbHlyL2Jsb2IvbWFzdGVyL1JFQURNRS5tZCkKCi0gW1RoZSBiZXN0IHBsYWNlIHRvIHN0YXJ0XShodHRwczovL3I0ZHMuaGFkLmNvLm56L3RyYW5zZm9ybS5odG1sKQoKIyMgQ2hlYXRzaGVldAoKPGEgaHJlZj0iaHR0cHM6Ly9naXRodWIuY29tL3JzdHVkaW8vY2hlYXRzaGVldHMvYmxvYi9tYXN0ZXIvZGF0YS10cmFuc2Zvcm1hdGlvbi5wZGYiPjxpbWcgc3JjPSJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vcnN0dWRpby9jaGVhdHNoZWV0cy9tYXN0ZXIvcG5ncy90aHVtYm5haWxzL2RhdGEtdHJhbnNmb3JtYXRpb24tY2hlYXRzaGVldC10aHVtYnMucG5nIiB3aWR0aD0iNjMwIiBoZWlnaHQ9IjI1MiIvPjwvYT4gIAoKCgo=